Divide & Conquer-based Inclusion Dependency Discovery
Authors
Abstract
The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help to perform data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs gets harder as datasets become larger in terms of both the number of tuples and the number of attributes. To this end, we propose Binder, an IND detection system that is capable of detecting both unary and n-ary INDs. It is based on a divide & conquer approach, which allows it to handle very large datasets, an important property in the face of the ever-increasing size of today’s data. In contrast to most related work, we neither rely on existing database functionality nor assume that the inspected datasets fit into main memory. This makes Binder an efficient and scalable competitor. Our exhaustive experimental evaluation shows that Binder clearly outperforms the state of the art in both unary (Spider) and n-ary (Mind) IND discovery. Binder is up to 26x faster than Spider and more than 2500x faster than Mind.
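To make the idea of a unary IND and of a divide & conquer check concrete, here is a minimal Python sketch. It is not Binder's implementation: the function name `unary_inds`, the `num_buckets` parameter, and the choice of hash partitioning are assumptions made for illustration, and the buckets are kept in memory for brevity, whereas the abstract states that Binder does not assume the data fits into main memory.

```python
from collections import defaultdict
from itertools import permutations


def unary_inds(columns, num_buckets=4):
    """Return all pairs (dep, ref) where every value of column `dep`
    also occurs in column `ref` (a unary inclusion dependency).

    Hypothetical illustration only, not the algorithm from the paper.
    """
    names = list(columns)
    # Start with every ordered pair of distinct columns as a candidate.
    candidates = set(permutations(names, 2))

    # Divide: hash-partition the distinct values of each column into buckets.
    # Equal values always land in the same bucket, so each bucket can be
    # checked independently of the others.
    buckets = [defaultdict(set) for _ in range(num_buckets)]
    for name, values in columns.items():
        for value in values:
            buckets[hash(value) % num_buckets][name].add(value)

    # Conquer: a candidate survives only if it holds in every bucket.
    for bucket in buckets:
        candidates = {
            (dep, ref)
            for dep, ref in candidates
            if bucket[dep] <= bucket[ref]  # subset test within this bucket
        }
    return candidates


columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["a", "b", "c", "d"],
}
result = unary_inds(columns)
print(result)
```

In this toy input, the only surviving candidate is `orders.customer_id ⊆ customers.id`, which is the shape of a typical foreign key relationship. Because each bucket is validated on its own, a system working along these lines could load one bucket at a time instead of the whole dataset.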
Similar resources
Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section, an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system, as well as a special numbering scheme, were employed. In the second section, the divide and conquer method has been used for eigensolution of these structures, where the matrices are in ...
A Divide-and-Conquer Strategy for Parsing
In this paper, we propose a novel strategy which is designed to enhance the accuracy of the parser by simplifying complex sentences before parsing. This approach involves the separate parsing of the constituent sub-sentences within a complex sentence. To achieve that, the divide-and-conquer strategy first disambiguates the roles of the link words in the sentence and segments the sentence based...
Knowledge Reduction Based on Divide and Conquer Method in Rough Set Theory
The divide and conquer method is a typical granular computing method using multiple levels of abstraction and granulations. So far, although some results based on the divide and conquer method in rough set theory have been obtained, systematic methods for knowledge reduction based on the divide and conquer method are still absent. In this paper, the knowledge reduction approaches based on...
Plan Mining by Divide-and-Conquer
Plans, or sequences of actions, are an important form of data. With the proliferation of database technology, plan databases, or planbases, are increasingly common. Efficient discovery of important patterns of actions in plan databases presents a challenge to data mining. In this paper, we present a method for mining significant patterns of successful actions in a large planbase using a divide and conquer ...
On Parity based Divide and Conquer Recursive Functions
The parity based divide and conquer recursion trees are introduced, where the sizes of the trees do not grow monotonically as n grows. These non-monotonic recursive functions, called fogk(n) and f̃ogk(n), are strictly less than linear, o(n), but greater than logarithmic, Ω(log n). Properties of fogk(n) such as non-monotonicity, upper and lower bounds, etc. are examined and proven. These functions are us...
Journal title:
- PVLDB
Volume 8, Issue
Pages -
Publication date 2015